SUMMARY



It is shown that the prediction intervals derived by Hahn [2,3,4] and Hickman [5] for the mean, the standard deviation and all the observations of a future random sample, based on an earlier informative random sample, remain valid even when the sample observations are correlated and have a specified correlation structure such as intraclass correlation.



1. Introduction 



Suppose that a random sample of size $N = n_0 + n_1$ is drawn from a normal distribution $N(\mu,\sigma^2)$. The subsample consisting of the first $n_0$ observations will be called the initial sample and the remaining $n_1$ observations the future sample. Hickman [5] considered the problem of obtaining forecast intervals for the mean $\bar{X}_N$ and variance $S_N^2$ of the entire sample of size $N$ based on the mean $\bar{X}_{n_0}$ and variance $S_{n_0}^2$ of the initial sample; the end points of the forecast intervals for $\bar{X}_N$ and $S_N^2$ depend on the observations of the initial sample only through $\bar{X}_{n_0}$ and $S_{n_0}^2$. Hahn [2,3,4] derived prediction intervals for the mean and variance of the second sample as well as a simultaneous prediction interval to contain each of the $n_1$ observations of the second sample. He also considered the problem of constructing simultaneous prediction intervals for the variances of each of $k$ additional random samples of size $n_1$ based on an informative random sample of size $n_0$. In this paper we show that the results of Hickman and Hahn for random samples remain valid even when the sample observations are correlated and have a special correlation structure such as intraclass correlation. Samples with intraclass correlation occur, for example, in the study of random effects models in the analysis of variance.



2. Notation and Basic Results

Let $X_{ij}$, $j = 1,2,\ldots,n_i$, $i = 0,1,2,\ldots,k$ be $(k+1)$ sets of random samples of size $n_i$ from a normal distribution $N(\mu,\sigma^2)$ and let $N = \sum_{i=0}^{k} n_i$. The means and variances of the $(k+1)$ sets of random samples and of the pooled sample of $N$ observations are

$$\bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}, \qquad S_i^2 = \frac{1}{n_i}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}_i\right)^2, \qquad i = 0,1,\ldots,k,$$

$$\bar{X} = \frac{1}{N}\sum_{i=0}^{k}\sum_{j=1}^{n_i} X_{ij}, \qquad S^2 = \frac{1}{N}\sum_{i=0}^{k}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}\right)^2.$$

In deriving the results it is convenient to use matrix notation and to express the sample variances as quadratic forms. Let the symbols $I$ and $E$ represent the identity matrix and a matrix all of whose elements are unity, respectively. Also, define the vector $X$ by

$$X = \left(X_{01}, X_{02}, \ldots, X_{0n_0}, X_{11}, X_{12}, \ldots, X_{1n_1}, \ldots, X_{k1}, X_{k2}, \ldots, X_{kn_k}\right)'.$$



Then

$$NS^2 = X'BX \qquad\text{and}\qquad n_iS_i^2 = X'B_iX, \qquad i = 0,1,2,\ldots,k,$$

where $B = I_{N\times N} - N^{-1}E_{N\times N}$ and the $B_i$ are $N\times N$ block diagonal matrices with the $i$th diagonal block equal to $I_{n_i\times n_i} - n_i^{-1}E_{n_i\times n_i}$ and the remaining blocks equal to zero matrices. It is easily shown that the matrices $B$ and $B_i$ are idempotent and that $X'BX/\sigma^2$ and $X'B_iX/\sigma^2$ have chi-square distributions with $(N-1)$ and $(n_i-1)$ degrees of freedom (see Rao [10, Chapter 3]). Further, the quadratic form $X'BX$ can be partitioned as



$$X'BX = \sum_{i=0}^{k} X'B_iX + \sum_{i=0}^{k} n_i\left(\bar{X}_i - \bar{X}\right)^2.$$

It follows from an application of Hogg and Craig's Theorem [6, Chap. XIII] that $X'B_iX/\sigma^2$, $i = 0,1,2,\ldots,k$ are mutually independent chi-square variates and that $\sum_{i=0}^{k} n_i(\bar{X}_i - \bar{X})^2/\sigma^2 = X'B_{k+1}X/\sigma^2$ also has a chi-square distribution with $k$ degrees of freedom; as a consequence the matrix $B_{k+1}$ is also idempotent.
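As a numerical illustration of the partition, the following Python sketch (assuming NumPy; the group sizes and the helper names `centering` and `partition_matrices` are hypothetical) builds $B$, the $B_i$ and $B_{k+1} = B - \sum_{i=0}^{k} B_i$ for one simulated data set and checks the quadratic-form identities, the idempotency of every matrix and the rank $k$ of $B_{k+1}$.

```python
import numpy as np

def centering(n):
    """Return the n x n matrix I - n^{-1} E."""
    return np.eye(n) - np.ones((n, n)) / n

def partition_matrices(sizes):
    """Return B, the list [B_0, ..., B_k] and B_{k+1} = B - sum_i B_i."""
    N = sum(sizes)
    B = centering(N)
    blocks, start = [], 0
    for n_i in sizes:
        B_i = np.zeros((N, N))
        B_i[start:start + n_i, start:start + n_i] = centering(n_i)
        blocks.append(B_i)
        start += n_i
    return B, blocks, B - sum(blocks)

sizes = [5, 3, 4]                            # illustrative n_0, n_1, n_2 (k = 2)
B, blocks, B_between = partition_matrices(sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=sum(sizes))
groups = np.split(X, np.cumsum(sizes)[:-1])

NS2 = X.size * X.var()                       # N S^2 (np.var uses divisor N)
within = [g.size * g.var() for g in groups]  # n_i S_i^2
between = sum(g.size * (g.mean() - X.mean()) ** 2 for g in groups)

assert np.isclose(X @ B @ X, NS2)
assert all(np.isclose(X @ Bi @ X, w) for Bi, w in zip(blocks, within))
assert np.isclose(X @ B_between @ X, between)
for M in [B, *blocks, B_between]:            # every matrix is idempotent
    assert np.allclose(M @ M, M)
assert np.linalg.matrix_rank(B_between) == len(sizes) - 1   # k degrees of freedom
```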

The construction of a simultaneous prediction interval to contain the variances of $k$ sets of future samples is based on a statistic whose distribution is known as the studentized largest (smallest) chi-square distribution. Suppose $Y_0$ is a chi-square random variable with $\nu_0$ degrees of freedom, $Y_1, Y_2, \ldots, Y_m$ are identically distributed as chi-square with $\nu_1$ degrees of freedom, and $Y_0, Y_1, Y_2, \ldots, Y_m$ are mutually independent. Then the distributions of

$$W_L = \frac{\min(Y_1, Y_2, \ldots, Y_m)}{Y_0} \qquad\text{and}\qquad W_U = \frac{\max(Y_1, Y_2, \ldots, Y_m)}{Y_0}$$

are known as the studentized smallest and largest chi-square distributions, respectively. These distributions depend on three parameters $\nu_0$, $\nu_1$ and $m$, and Krishnaiah and Armitage [7,8] constructed tables of percentage points for the two distributions; the $100\gamma$th percentiles of the distributions will be denoted by $W_L(\nu_0, \nu_1, m, \gamma)$ and $W_U(\nu_0, \nu_1, m, \gamma)$.
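The Krishnaiah and Armitage tables are not reproduced here. As a rough stand-in, the percentiles can be approximated by simulating $Y_0, Y_1, \ldots, Y_m$ directly from their definitions. The sketch below is in Python with NumPy; the function name and the number of replications are illustrative assumptions, and the result is a Monte Carlo approximation, not the tabulated value.

```python
import numpy as np

def studentized_chi2_percentiles(nu0, nu1, m, gamma, reps=200_000, seed=0):
    """Monte Carlo estimates of W_L(nu0, nu1, m, gamma) and W_U(nu0, nu1, m, gamma).

    Y_0 ~ chi2(nu0) and Y_1, ..., Y_m ~ chi2(nu1) are drawn independently;
    W_L = min(Y_1..Y_m) / Y_0 and W_U = max(Y_1..Y_m) / Y_0.
    """
    rng = np.random.default_rng(seed)
    y0 = rng.chisquare(nu0, size=reps)
    y = rng.chisquare(nu1, size=(reps, m))
    w_l = y.min(axis=1) / y0
    w_u = y.max(axis=1) / y0
    return np.quantile(w_l, gamma), np.quantile(w_u, gamma)

# With m = 1 both statistics reduce to Y_1 / Y_0 = (nu1 / nu0) * F(nu1, nu0).
w_l1, w_u1 = studentized_chi2_percentiles(nu0=10, nu1=10, m=1, gamma=0.95)
_, w_u3 = studentized_chi2_percentiles(nu0=10, nu1=10, m=3, gamma=0.95)
```

With $\nu_0 = \nu_1 = 10$ and $m = 1$ the estimate should lie near the 95th percentile of $F(10,10)$, about 2.98, which gives a simple check on the simulation.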






The multivariate t distribution is used in deriving a simultaneous prediction interval for each observation in a future sample of size $n_1$ based on a prior sample of size $n_0$. Let $Z = (Z_1, Z_2, \ldots, Z_p)'$ be distributed as multivariate normal with zero mean vector and covariance matrix $A$; the diagonal elements of $A$ are all equal to $\sigma^2$ and all the off-diagonal elements are equal to $\rho\sigma^2$. If $S^2/\sigma^2$ is a chi-square variate with $\nu$ degrees of freedom distributed independently of $Z$, then the joint distribution of $t_1, t_2, \ldots, t_p$, where $t_i = \sqrt{\nu}\,Z_i/S$, is known as the central $p$-variate t distribution. Krishnaiah and Armitage [9] tabulated the values $t_p(\nu, \rho, \gamma)$ such that

$$P\left[t_i \le t_p(\nu, \rho, \gamma),\ i = 1,2,\ldots,p\right] = \gamma$$

for various choices of $p$, $\nu$ and $\rho$. The percentage points $t(\nu, \gamma)$ and $F(\nu_1, \nu_2, \gamma)$ of Student's t distribution and the F distribution are also used in constructing some of the prediction intervals.
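The percentage points $t_p(\nu, \rho, \gamma)$ can likewise be approximated by simulation when the tables are not at hand. The Python sketch below (NumPy; hypothetical function name, Monte Carlo approximation assumed adequate for illustration) builds equicorrelated normals as $Z_i = \sqrt{\rho}\,U + \sqrt{1-\rho}\,V_i$ with $\sigma = 1$, which requires $0 \le \rho < 1$, and divides by an independent $\sqrt{\chi^2_\nu/\nu}$.

```python
import numpy as np

def multivariate_t_percentile(nu, rho, p, gamma, reps=200_000, seed=1):
    """Monte Carlo estimate of t_p(nu, rho, gamma), where P[t_i <= c, i = 1..p] = gamma.

    Z_i = sqrt(rho) U + sqrt(1 - rho) V_i gives unit variances and common
    correlation rho (requires 0 <= rho < 1); S^2 / sigma^2 ~ chi2(nu) independently.
    """
    rng = np.random.default_rng(seed)
    z = (np.sqrt(rho) * rng.standard_normal((reps, 1))
         + np.sqrt(1.0 - rho) * rng.standard_normal((reps, p)))
    s = np.sqrt(rng.chisquare(nu, size=(reps, 1)) / nu)   # S / (sigma * sqrt(nu))
    t = z / s                                             # t_i = sqrt(nu) Z_i / S
    return np.quantile(t.max(axis=1), gamma)

c1 = multivariate_t_percentile(nu=10, rho=0.0, p=1, gamma=0.975)
c4 = multivariate_t_percentile(nu=10, rho=0.2, p=4, gamma=0.975)
```

For $p = 1$ the estimate should be close to Student's $t(10, 0.975) \approx 2.228$; for $p > 1$ the simultaneous point is larger.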

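A theorem due to Baldessari [1], used in extending the results of Hickman and Hahn, is stated below.

Baldessari's Theorem: Let $X$ ($n\times 1$) have a multivariate normal distribution $N(\mu, V)$, where $\mu = (\mu, \ldots, \mu)'$ and $V$ is a positive definite matrix, and let $B_j$, $j = 0,1,2,\ldots,k$ be idempotent matrices satisfying

$$\sum_{j=0}^{k} B_j = I_{n\times n} - n^{-1}E_{n\times n}.$$

A necessary and sufficient condition for $X'B_jX/\alpha$, $j = 0,1,2,\ldots,k$ to be independent and have chi-square distributions with $r_j = \operatorname{rank} B_j$ degrees of freedom is that the covariance matrix $V$ have the form

$$V_{n\times n} = (A + A') + \alpha(I - E) \qquad (2.1)$$

where

$$A = \begin{pmatrix} a_1 & a_1 & \cdots & a_1 \\ a_2 & a_2 & \cdots & a_2 \\ \vdots & \vdots & & \vdots \\ a_n & a_n & \cdots & a_n \end{pmatrix}$$

and $\alpha$ and the $a_j$ are positive constants.

A covariance matrix with the structure defined in Baldessari's Theorem occurs in the study of the variance component model

$$Y_{ij} = \mu + a_i + e_{ij}, \qquad j = 1,2,\ldots,n;\ i = 1,2,\ldots,k,$$

where $\mu$ is a constant, the $a_i$ are i.i.d. $N(0, \sigma_a^2)$, the $e_{ij}$ are i.i.d. $N(0, \sigma^2)$, and the $a_i$ and $e_{ij}$ are mutually independent. In this case it can be shown that $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in})'$ has a multivariate normal distribution with mean $\mu = (\mu, \mu, \ldots, \mu)'$ and covariance matrix

$$V = \begin{pmatrix} \sigma_a^2+\sigma^2 & \sigma_a^2 & \cdots & \sigma_a^2 \\ \sigma_a^2 & \sigma_a^2+\sigma^2 & \cdots & \sigma_a^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_a^2 & \sigma_a^2 & \cdots & \sigma_a^2+\sigma^2 \end{pmatrix}.$$

To see that $V$ has the same form as in Baldessari's Theorem, let the constants of (2.1) (not the random effects $a_i$) be

$$a_1 = a_2 = \cdots = a_n = \tfrac{1}{2}\left(\sigma^2 + \sigma_a^2\right) \qquad\text{and}\qquad \alpha = \sigma^2;$$

then $(A + A') + \alpha(I - E)$ has diagonal elements $\sigma^2 + \sigma_a^2$ and off-diagonal elements $\sigma_a^2$, which is $V$.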
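A quick numerical check of this correspondence, in Python with NumPy and with illustrative values of $n$, $\sigma^2$ and $\sigma_a^2$: the sketch confirms that $V$ equals $(A + A') + \alpha(I - E)$ for the stated choice of constants and that $BVB = \alpha B$ for $B = I - n^{-1}E$, which, together with $B\mu = 0$ for a common mean, is a sufficient condition for $Y_i'BY_i/\alpha$ to be chi-square with $n - 1$ degrees of freedom.

```python
import numpy as np

n, sigma2, sigma2_a = 6, 2.0, 0.5            # illustrative values
E = np.ones((n, n))
I = np.eye(n)

# Covariance of Y_i in the random-effects model.
V = sigma2_a * E + sigma2 * I

# Form (2.1) with a_1 = ... = a_n = (sigma^2 + sigma_a^2) / 2 and alpha = sigma^2.
a = np.full(n, (sigma2 + sigma2_a) / 2)
A = np.outer(a, np.ones(n))                  # row j of A is (a_j, ..., a_j)
alpha = sigma2
assert np.allclose(V, (A + A.T) + alpha * (I - E))

# B V B = alpha B for the centering matrix B.
B = I - E / n
assert np.allclose(B @ V @ B, alpha * B)
```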



3. Prediction Intervals for Sample Variances 



Suppose $X_{01}, X_{02}, \ldots, X_{0n_0}$ is an initial sample and $X_{i1}, X_{i2}, \ldots, X_{in_i}$, $i = 1,2,\ldots,k$ are $k$ sets of future samples. Let $N = \sum_{i=0}^{k} n_i$ and $X = (X_{01}, \ldots, X_{0n_0}, \ldots, X_{k1}, \ldots, X_{kn_k})'$. It is assumed that $X$ ($N\times 1$) is distributed as an $N$-variate normal with mean vector $\mu = (\mu, \ldots, \mu)'$ and covariance matrix $V$ as in (2.1). The problem is to construct a simultaneous prediction interval to contain each of the sample variances $S_i^2$, $i = 1,2,\ldots,k$. As indicated in Section 2, if $S^2$ is the variance of the pooled sample, then

$$NS^2 = \sum_{i=0}^{k} n_iS_i^2 + \sum_{i=0}^{k} n_i\left(\bar{X}_i - \bar{X}\right)^2$$

or, equivalently, in matrix notation,

$$X'BX = \sum_{i=0}^{k} X'B_iX + X'B_{k+1}X. \qquad (3.1)$$



For random samples, i.e., for $V = \sigma^2 I$, the variables $Q_i/\sigma^2 = X'B_iX/\sigma^2$, $i = 0,1,2,\ldots,k$ are distributed as chi-square and are mutually independent; by Hogg and Craig's Theorem [6], $Q_{k+1}/\sigma^2 = X'B_{k+1}X/\sigma^2$ is chi-square distributed, implying that $B_{k+1}$ is idempotent.

Since in equation (3.1) the matrices $B$ and $B_i$ satisfy (i) $B = \sum_{i=0}^{k+1} B_i$ and (ii) the matrices $B_i$ are idempotent, by Baldessari's Theorem $Q_i/\alpha$, $i = 0,1,2,\ldots,k+1$ are mutually independent chi-square variates even for correlated samples with covariance matrix $V$ as in (2.1).

2 

The prediction intervals for $S_i^2$, $i = 1,2,\ldots,k$ are obtained as follows. If $k = 1$, the variates $Q_0/\alpha$, $Q_1/\alpha$ and $Q_2/\alpha$ are independently distributed as chi-square with $n_0-1$, $n_1-1$ and $1$ degrees of freedom, respectively. The ratio $(n_0-1)Q_1/[(n_1-1)Q_0]$ has an F distribution with $n_1-1$ and $n_0-1$ degrees of freedom. For a specified confidence coefficient $1-\gamma$,

$$P\left[F\left(n_1-1, n_0-1, \tfrac{\gamma}{2}\right) \le \frac{(n_0-1)Q_1}{(n_1-1)Q_0} \le F\left(n_1-1, n_0-1, 1-\tfrac{\gamma}{2}\right)\right] = 1-\gamma$$

and thus

$$P\left[\frac{n_0(n_1-1)}{n_1(n_0-1)}\,S_0^2\,F\left(n_1-1, n_0-1, \tfrac{\gamma}{2}\right) \le S_1^2 \le \frac{n_0(n_1-1)}{n_1(n_0-1)}\,S_0^2\,F\left(n_1-1, n_0-1, 1-\tfrac{\gamma}{2}\right)\right] = 1-\gamma.$$
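The $k = 1$ interval for $S_1^2$ can be sketched in Python as follows, assuming SciPy's `scipy.stats.f.ppf` for the percentiles $F(\nu_1, \nu_2, \gamma)$, read here as $100\gamma$th percentiles. The function names and the simulation model are illustrative assumptions: every one of the $N$ observations receives the same random effect, so the covariance matrix is $\sigma_a^2E + \sigma^2I$, a special case of (2.1), and the estimated coverage should be close to $1-\gamma$ even though the observations are correlated.

```python
import numpy as np
from scipy.stats import f

def variance_prediction_interval(s0_sq, n0, n1, gamma):
    """100(1 - gamma)% prediction interval for S_1^2 from S_0^2 (divisor-n variances)."""
    coef = n0 * (n1 - 1) / (n1 * (n0 - 1))
    lower = coef * s0_sq * f.ppf(gamma / 2, n1 - 1, n0 - 1)
    upper = coef * s0_sq * f.ppf(1 - gamma / 2, n1 - 1, n0 - 1)
    return lower, upper

def coverage_intraclass(n0, n1, gamma, sigma2=1.0, sigma2_a=3.0, reps=20_000, seed=2):
    """Fraction of simulated data sets whose S_1^2 falls inside the interval."""
    rng = np.random.default_rng(seed)
    shared = rng.normal(scale=np.sqrt(sigma2_a), size=(reps, 1))  # one effect per data set
    x = 5.0 + shared + rng.normal(scale=np.sqrt(sigma2), size=(reps, n0 + n1))
    s0_sq = x[:, :n0].var(axis=1)            # divisor n0
    s1_sq = x[:, n0:].var(axis=1)            # divisor n1
    lo, hi = variance_prediction_interval(s0_sq, n0, n1, gamma)
    return np.mean((lo <= s1_sq) & (s1_sq <= hi))

lo, hi = variance_prediction_interval(s0_sq=1.0, n0=10, n1=10, gamma=0.05)
cov = coverage_intraclass(n0=15, n1=8, gamma=0.05)
```

When $n_0 = n_1$ the two limits are reciprocals of each other for $S_0^2 = 1$, because $F(\nu, \nu, \gamma/2) = 1/F(\nu, \nu, 1-\gamma/2)$.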



A prediction interval for the variance of the pooled sample is obtained by noting that $(n_0-1)(Q_1+Q_2)/(n_1Q_0)$ has an F distribution with $n_1$ and $n_0-1$ degrees of freedom. A $100(1-\gamma)\%$ prediction interval for $S_N^2 = (Q_0+Q_1+Q_2)/N$ is

$$\frac{n_0S_0^2}{N}\left[\frac{n_1}{n_0-1}F\left(n_1, n_0-1, \tfrac{\gamma}{2}\right)+1\right] \le S_N^2 \le \frac{n_0S_0^2}{N}\left[\frac{n_1}{n_0-1}F\left(n_1, n_0-1, 1-\tfrac{\gamma}{2}\right)+1\right].$$

The two prediction intervals for $S_1^2$ and $S_N^2$ are exactly the same as the ones obtained by Hahn and Hickman for random samples.
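A corresponding sketch for the pooled variance, again assuming SciPy's `scipy.stats.f.ppf` and using hypothetical helper names and illustrative sizes. The coverage check draws one $N(0, 4)$ random effect shared by all $N$ observations, so the covariance matrix has the form (2.1).

```python
import numpy as np
from scipy.stats import f

def pooled_variance_prediction_interval(s0_sq, n0, n1, gamma):
    """100(1 - gamma)% prediction interval for S_N^2, N = n0 + n1 (divisor-N variance)."""
    N = n0 + n1
    scale = n0 * s0_sq / N
    lower = scale * (n1 / (n0 - 1) * f.ppf(gamma / 2, n1, n0 - 1) + 1)
    upper = scale * (n1 / (n0 - 1) * f.ppf(1 - gamma / 2, n1, n0 - 1) + 1)
    return lower, upper

def pooled_coverage(n0, n1, gamma, reps=20_000, seed=8):
    """Simulated coverage when all N observations share one N(0, 4) random effect."""
    rng = np.random.default_rng(seed)
    x = 2.0 * rng.standard_normal((reps, 1)) + rng.standard_normal((reps, n0 + n1))
    lo, hi = pooled_variance_prediction_interval(x[:, :n0].var(axis=1), n0, n1, gamma)
    s_n_sq = x.var(axis=1)                   # S_N^2 with divisor N
    return np.mean((lo <= s_n_sq) & (s_n_sq <= hi))

lo, hi = pooled_variance_prediction_interval(s0_sq=4.0, n0=20, n1=10, gamma=0.05)
cov = pooled_coverage(20, 10, 0.05)
```

Because the F percentiles are positive, the lower limit always exceeds $n_0S_0^2/N$, the contribution of the initial sample alone.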

2 

For $k > 1$, a simultaneous prediction interval for $S_i^2$, $i = 1,2,\ldots,k$ is derived by assuming that $n_i = n$, $i = 1,2,\ldots,k$. The variate

$$W_U = \frac{\max\left(Q_1/\alpha, Q_2/\alpha, \ldots, Q_k/\alpha\right)}{Q_0/\alpha}$$

has a studentized largest chi-square distribution with parameters $n_0-1$, $n-1$ and $k$. Hence,

$$P\left[Q_i \le W_U(n_0-1, n-1, k, 1-\gamma)\,Q_0,\ i = 1,2,\ldots,k\right] = 1-\gamma$$

and

$$P\left[S_i^2 \le \frac{n_0}{n}\,W_U(n_0-1, n-1, k, 1-\gamma)\,S_0^2,\ i = 1,2,\ldots,k\right] = 1-\gamma.$$

This simultaneous interval is an extension of Hahn's results. Lower bounds and two-sided bounds may be obtained using the studentized smallest chi-square variate.
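A sketch of the simultaneous upper bound in Python (NumPy only). Because the tabulated $W_U$ values are not reproduced here, the percentile is approximated by simulation; the function names, the sample sizes and the shared random effect used in the coverage check are illustrative assumptions.

```python
import numpy as np

def w_upper(nu0, nu1, k, prob, reps=200_000, seed=3):
    """Monte Carlo stand-in for the tabulated W_U(nu0, nu1, k, prob)."""
    rng = np.random.default_rng(seed)
    y0 = rng.chisquare(nu0, size=reps)
    y = rng.chisquare(nu1, size=(reps, k))
    return np.quantile(y.max(axis=1) / y0, prob)

def simultaneous_variance_bound(s0_sq, n0, n, w_u):
    """Common upper bound (n0 / n) * W_U * S_0^2 for S_1^2, ..., S_k^2."""
    return n0 / n * w_u * s0_sq

n0, n, k, gamma = 12, 6, 4, 0.05             # illustrative sizes
w_u = w_upper(n0 - 1, n - 1, k, 1 - gamma)

# Coverage check: one N(0, 4) random effect shared by all n0 + k*n observations.
rng = np.random.default_rng(4)
reps = 20_000
x = 2.0 * rng.standard_normal((reps, 1)) + rng.standard_normal((reps, n0 + k * n))
s0_sq = x[:, :n0].var(axis=1)                        # S_0^2, divisor n0
s_sq = x[:, n0:].reshape(reps, k, n).var(axis=2)     # S_i^2, divisor n
bound = simultaneous_variance_bound(s0_sq, n0, n, w_u)
coverage = np.mean(np.all(s_sq <= bound[:, None], axis=1))
```

The estimated simultaneous coverage should be close to $1-\gamma = 0.95$, since the shared random effect cancels from every $S_i^2$.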



4. Prediction Intervals for the Observations in a Future Sample

Let $X_1, X_2, \ldots, X_n, X_{n+1}, X_{n+2}, \ldots, X_{n+m}$ be a sample of size $N = n+m$ such that $X = (X_1, X_2, \ldots, X_n, X_{n+1}, X_{n+2}, \ldots, X_{n+m})'$ has a multivariate normal distribution with mean $\mu = (\mu, \ldots, \mu)'$ and covariance matrix $V$ as in (2.1). Let $(\bar{X}_n, S_n^2)$, $(\bar{X}_m, S_m^2)$ and $(\bar{X}_N, S_N^2)$ denote the mean and variance of the first $n$ observations, the second set of $m$ observations and the pooled sample of $N$ observations, respectively.

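For $m = 1$, that is, $N = n+1$,

$$(n+1)S_N^2 = nS_n^2 + \frac{n}{n+1}\left(X_{n+1} - \bar{X}_n\right)^2$$

or, as quadratic forms,

$$X'BX = X'B_1X + X'B_2X,$$

where $B_1$ is the $N\times N$ block diagonal matrix whose first diagonal block is $I_{n\times n} - n^{-1}E_{n\times n}$ and whose remaining block is zero, so that $X'B_1X = nS_n^2$ and $X'B_2X = \frac{n}{n+1}(X_{n+1} - \bar{X}_n)^2$. By the use of Baldessari's Theorem it can be concluded that

$$\frac{(n-1)\,X'B_2X}{X'B_1X} = \frac{(n-1)\left(X_{n+1} - \bar{X}_n\right)^2}{(n+1)S_n^2}$$

is distributed as F with 1 and $n-1$ degrees of freedom. Thus,

$$P\left[\bar{X}_n - \left(\frac{(n+1)S_n^2\,F(1, n-1, 1-\gamma)}{n-1}\right)^{1/2} < X_{n+1} < \bar{X}_n + \left(\frac{(n+1)S_n^2\,F(1, n-1, 1-\gamma)}{n-1}\right)^{1/2}\right] = 1-\gamma.$$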
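The $m = 1$ interval is sketched below in Python, assuming SciPy's `scipy.stats.f` and `scipy.stats.t`; `next_observation_interval` is a hypothetical helper and the input values are illustrative. Because an $F(1, \nu)$ variate is the square of a Student t variate with $\nu$ degrees of freedom, the half-width can be cross-checked against $t(\nu, 1-\gamma/2)$.

```python
import numpy as np
from scipy.stats import f, t

def next_observation_interval(x_bar, s_sq, n, gamma):
    """100(1 - gamma)% prediction interval for X_{n+1}; s_sq is S_n^2 with divisor n."""
    half = np.sqrt((n + 1) * s_sq * f.ppf(1 - gamma, 1, n - 1) / (n - 1))
    return x_bar - half, x_bar + half

lo, hi = next_observation_interval(x_bar=10.0, s_sq=4.0, n=15, gamma=0.05)

# F(1, nu, 1 - gamma) equals the square of t(nu, 1 - gamma / 2), here with nu = 14.
half_t = t.ppf(0.975, 14) * np.sqrt(16 * 4.0 / 14)
```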



If $m > 1$, a prediction interval for the mean of the second sample can be obtained following the same procedure as above, that is, starting with a partition of the sum of squares $NS_N^2$. The resulting interval would be exactly the same as the one obtained by Hickman for random samples.

To construct a simultaneous prediction interval for $X_{n+1}, X_{n+2}, \ldots, X_{n+m}$, let $Z_i = X_{n+i} - \bar{X}_n$, $i = 1,2,\ldots,m$. Then $Z = (Z_1, Z_2, \ldots, Z_m)'$ can be expressed as $Z_{m\times 1} = C'_{m\times N}X_{N\times 1}$, where the matrix $C'$ is defined by

$$C'_{m\times N} = \left(-n^{-1}E_{m\times n} \;\middle|\; I_{m\times m}\right).$$

It follows that $Z$ is distributed as multivariate normal with zero mean vector and covariance matrix (see the Appendix for computations)

$$C'VC = \alpha\left(n^{-1}E_{m\times m} + I_{m\times m}\right).$$

Also, $Z = C'X$ and the quadratic form $X'B_1X = nS_n^2$, where $B_1$ is the $N\times N$ matrix with first diagonal block $I_{n\times n} - n^{-1}E_{n\times n}$ and zeros elsewhere and $X'B_1X/\alpha$ is chi-square distributed with $n-1$ degrees of freedom, are statistically independent since $C'VB_1 = 0$ (see the Appendix for computations), which is a sufficient condition for independence (see Rao [10, Chap. 3]).
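The covariance and independence computations can be verified numerically. In the Python sketch below (NumPy; the sizes, $\alpha$ and the randomly drawn $a_j$ are illustrative), $V$ is built in the form (2.1) without requiring it to be positive definite, since only the algebraic identities $C'VC = \alpha(n^{-1}E + I)$ and $C'VB_1 = 0$ are being checked.

```python
import numpy as np

n, m, alpha = 5, 3, 1.5                      # illustrative values
N = n + m
rng = np.random.default_rng(5)
a = rng.uniform(0.5, 2.0, size=N)            # arbitrary a_1, ..., a_N
A = np.outer(a, np.ones(N))                  # every column of A equals (a_1, ..., a_N)'
V = (A + A.T) + alpha * (np.eye(N) - np.ones((N, N)))   # form (2.1)

C_t = np.hstack([-np.ones((m, n)) / n, np.eye(m)])      # C' (m x N)
B1 = np.zeros((N, N))
B1[:n, :n] = np.eye(n) - np.ones((n, n)) / n            # n S_n^2 = X' B_1 X

cov_Z = C_t @ V @ C_t.T
assert np.allclose(cov_Z, alpha * (np.ones((m, m)) / n + np.eye(m)))
assert np.allclose(C_t @ V @ B1, 0.0)
```

The ratio of an off-diagonal to a diagonal element of `cov_Z` is $(1/n)/(1 + 1/n) = 1/(n+1)$, the correlation that reappears in the multivariate t step.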
Let

$$W_i = \frac{Z_i}{\left[\alpha\left(1 + \frac{1}{n}\right)\right]^{1/2}} = \frac{X_{n+i} - \bar{X}_n}{\left[\alpha\left(1 + \frac{1}{n}\right)\right]^{1/2}}, \qquad i = 1,2,\ldots,m.$$

Then $W_1, W_2, \ldots, W_m$ are jointly distributed as an $m$-variate normal with zero mean vector and a covariance matrix whose diagonal elements are all equal to 1 and whose off-diagonal elements are all equal to $(1/n)/(1 + 1/n) = 1/(n+1)$. Define the variates



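$$t_i = \frac{X_{n+i} - \bar{X}_n}{S_n\sqrt{(n+1)/(n-1)}}, \qquad i = 1,2,\ldots,m.$$

The joint distribution of $t_1, t_2, \ldots, t_m$ is the central $m$-variate t distribution with parameters $(n-1)$ and $\rho = 1/(n+1)$. If $t_m\left(n-1, \frac{1}{n+1}, 1-\frac{\gamma}{2}\right)$ is the $100(1-\gamma/2)$th percentage point of the $m$-variate t distribution, then

$$P\left[\bar{X}_n - t_m\left(n-1, \tfrac{1}{n+1}, 1-\tfrac{\gamma}{2}\right)\sqrt{\tfrac{n+1}{n-1}}\,S_n < X_{n+i} < \bar{X}_n + t_m\left(n-1, \tfrac{1}{n+1}, 1-\tfrac{\gamma}{2}\right)\sqrt{\tfrac{n+1}{n-1}}\,S_n,\ i = 1,2,\ldots,m\right] \ge 1-\gamma,$$

since, by the symmetry of the central $m$-variate t distribution, each of the events $\{t_i \le t_m \text{ for all } i\}$ and $\{t_i \ge -t_m \text{ for all } i\}$ has probability $1-\gamma/2$.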



This result is identical to the prediction interval of Hahn for random 
samples. 
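Putting the pieces together, the following Python sketch (NumPy only) approximates $t_m(n-1, 1/(n+1), 1-\gamma/2)$ by simulation, in place of the Krishnaiah and Armitage tables, and then estimates the simultaneous coverage of the interval for $X_{n+1}, \ldots, X_{n+m}$ under a model in which all $n+m$ observations share one random effect. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def t_m_percentile(nu, rho, m, prob, reps=200_000, seed=6):
    """Monte Carlo stand-in for t_m(nu, rho, prob): P[t_i <= c, i = 1..m] = prob."""
    rng = np.random.default_rng(seed)
    z = (np.sqrt(rho) * rng.standard_normal((reps, 1))
         + np.sqrt(1.0 - rho) * rng.standard_normal((reps, m)))
    s = np.sqrt(rng.chisquare(nu, size=(reps, 1)) / nu)
    return np.quantile((z / s).max(axis=1), prob)

def simultaneous_observation_interval(x_bar, s_n, n, c):
    """Limits x_bar -/+ c * sqrt((n + 1) / (n - 1)) * s_n; s_n uses divisor n."""
    half = c * np.sqrt((n + 1) / (n - 1)) * s_n
    return x_bar - half, x_bar + half

n, m, gamma = 10, 4, 0.05                    # illustrative sizes
c = t_m_percentile(n - 1, 1 / (n + 1), m, 1 - gamma / 2)

# Coverage check: one N(0, 9) random effect shared by all n + m observations.
rng = np.random.default_rng(7)
reps = 20_000
x = 3.0 * rng.standard_normal((reps, 1)) + rng.standard_normal((reps, n + m))
first, future = x[:, :n], x[:, n:]
lo, hi = simultaneous_observation_interval(first.mean(axis=1), first.std(axis=1), n, c)
coverage = np.mean(np.all((lo[:, None] < future) & (future < hi[:, None]), axis=1))
```

The estimated coverage should be at least about $1-\gamma = 0.95$, because the upper and lower one-sided events can each fail with probability $\gamma/2$.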
APPENDIX



Suppose $X = (X_1, X_2, \ldots, X_n, X_{n+1}, X_{n+2}, \ldots, X_{n+m})'$ is a sample of size $N = n+m$ that is jointly distributed as multivariate normal with mean vector $\mu = (\mu, \ldots, \mu)'$ and covariance matrix

$$V_{N\times N} = (A + A') + \alpha(I - E),$$

where every column of $A$ equals $(a_1, a_2, \ldots, a_N)'$, and $\alpha$ and the $a_j$ are positive constants. Let $Z = (Z_1, Z_2, \ldots, Z_m)'$, where $Z_i = X_{n+i} - \bar{X}_n$, $i = 1,2,\ldots,m$. Then $Z = C'X$, where

$$C'_{m\times N} = \left(-n^{-1}E_{m\times n} \;\middle|\; I_{m\times m}\right).$$

The joint distribution of $Z_1, Z_2, \ldots, Z_m$ is multivariate normal with mean $C'\mu = 0$ and covariance matrix $C'VC = \alpha\left(n^{-1}E_{m\times m} + I_{m\times m}\right)$, as shown below.

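Partition the matrix $A$ as

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$

where $A_{11}$ is $n\times n$ and $A_{22}$ is $m\times m$. Note that in each of the matrices $A_{11}, A_{12}, A_{21}, A_{22}$ the columns are identical and all elements in each row are the same. Hence, writing $s = \sum_{j=1}^{n} a_j$,

$$C'A = \begin{pmatrix} \left(-\frac{s}{n} + a_{n+1}\right)E_{1\times N} \\ \left(-\frac{s}{n} + a_{n+2}\right)E_{1\times N} \\ \vdots \\ \left(-\frac{s}{n} + a_{n+m}\right)E_{1\times N} \end{pmatrix}.$$

Since each row of $C'$ sums to zero, $E_{1\times N}C = 0$ and $C'E_{N\times N}C = 0$, so that $C'AC = 0$ and $C'A'C = (C'AC)' = 0$. Therefore

$$C'VC = C'AC + C'A'C + \alpha C'C - \alpha C'EC = \alpha C'C = \alpha\left(\frac{1}{n}E_{m\times m} + I_{m\times m}\right).$$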



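Further, if

$$S_n^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}_n\right)^2,$$

then

$$nS_n^2 = X'B_1X, \qquad\text{where}\quad B_1 = \begin{pmatrix} I_{n\times n} - n^{-1}E_{n\times n} & 0 \\ 0 & 0 \end{pmatrix}.$$

The quadratic form $X'B_1X$ and the linear form $Z = C'X$ are independent since

$$C'VB_1 = (C'A)B_1 + C'A'B_1 + \alpha C'B_1 - \alpha C'EB_1 = 0.$$

Here $(C'A)B_1 = 0$ because each row of $C'A$ is a multiple of $E_{1\times N}$ and $E_{1\times N}B_1 = 0$; $C'A' = 0$ and $C'E = 0$ because every column of $A'$ and of $E$ is constant and each row of $C'$ sums to zero; and

$$\alpha C'B_1 = \alpha\left(-n^{-1}E_{m\times n}\left(I_{n\times n} - n^{-1}E_{n\times n}\right) \;\middle|\; 0_{m\times m}\right) = 0,$$

because $E_{m\times n}E_{n\times n} = nE_{m\times n}$.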